Tolerating Transient Faults in Statically Scheduled Safety-Critical Embedded Systems
Authors
Abstract
Static off-line scheduling ensures predictable worst-case behavior and high resource utilization for safety-critical applications, but it lacks the flexibility needed for run-time fault tolerance. We present a temporal redundancy-based recovery technique that tolerates transient task failures in statically scheduled distributed embedded systems where tasks have timing, resource, and precedence constraints. Task failures are handled using precomputed contingency schedules that introduce adaptive fault tolerance into table-driven dispatchers. Failures are masked using the spare capacity on the affected processor, and the recovery scheme requires no hardware overhead. Our approach combines the benefits of static scheduling with the run-time flexibility needed for fault tolerance in low-cost embedded systems. We present a method to obtain contingency schedules and prove its correctness. We also evaluate the effectiveness of the proposed method through simulation.
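The idea of masking a failure with spare capacity on the affected processor can be illustrated with a small sketch. This is not the method from the paper: the `Slot` type, the `contingency_schedule` function, and the single-processor, re-execute-immediately policy are all assumptions made for illustration, and precedence and resource constraints are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    """One entry of a hypothetical static dispatch table (single processor)."""
    task: str
    start: int     # scheduled start time from the static table
    wcet: int      # worst-case execution time
    deadline: int  # absolute deadline


def contingency_schedule(table: List[Slot], failed: str) -> Optional[List[Slot]]:
    """Re-execute `failed` right after its first run and shift later slots.

    Returns the shifted table, or None if any deadline would be missed.
    Assumption: the failure is detected when the task finishes.
    """
    if all(slot.task != failed for slot in table):
        raise ValueError(f"unknown task: {failed}")

    result: List[Slot] = []
    clock = 0
    recovering = False
    for slot in table:
        if not recovering:
            # Slots up to and including the failed one run as planned.
            result.append(slot)
            clock = slot.start + slot.wcet
            if slot.task == failed:
                recovering = True
                retry = Slot(slot.task + "'", clock, slot.wcet, slot.deadline)
                if retry.start + retry.wcet > retry.deadline:
                    return None  # not enough slack for the re-execution
                result.append(retry)
                clock = retry.start + retry.wcet
        else:
            # Later slots start no earlier than planned, and no earlier
            # than the processor becomes free.
            start = max(slot.start, clock)
            if start + slot.wcet > slot.deadline:
                return None  # the delay would break a later deadline
            result.append(Slot(slot.task, start, slot.wcet, slot.deadline))
            clock = start + slot.wcet
    return result


table = [Slot("A", 0, 2, 6), Slot("B", 3, 2, 8), Slot("C", 6, 1, 10)]
plan = contingency_schedule(table, "A")
```

In a table-driven dispatcher, one such plan per possible task failure could be computed off-line and selected at run time, so no scheduling decision is made while the system is running.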
Similar resources
Mälardalen University
In this paper we present an approach to the design optimization of fault-tolerant embedded systems for safety-critical applications. Processes are statically scheduled and communications are performed using the time-triggered protocol. We use process re-execution and replication for tolerating transient faults. Our design optimization approach decides the mapping of proc...
Task Scheduling Algorithms for Fault Tolerance in Real-time Embedded Systems
We survey scheduling algorithms proposed for tolerating permanent and transient failures in real-time embedded systems. These algorithms attempt to provide low-cost solutions to fault tolerance, graceful performance degradation, and load shedding in such systems by exploiting tradeoffs between space and/or time redundancy, timing accuracy, and quality of service. We place fault-tolerant schedul...
Tolerance to Multiple Transient Faults for Aperiodic Tasks in Hard Real-Time
Real-time systems are being increasingly used in several applications which are time-critical in nature. Fault tolerance is an essential requirement of such systems, due to the catastrophic consequences of not tolerating faults. In this paper, we study a scheme that guarantees the timely recovery from multiple faults within hard real-time constraints in uniprocessor systems. Assuming earliest-d...
Energy-Aware Synthesis of Fault-Tolerant Schedules for Real-Time Distributed Embedded Systems
In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. Addressing simultaneously energy and reliability is especially challenging because l...
Scheduling and Optimization of Fault-Tolerant Distributed Embedded Systems
Safety-critical applications have to function correctly even in the presence of faults. This thesis deals with techniques for tolerating the effects of transient and intermittent faults. Re-execution, software replication, and rollback recovery with checkpointing are used to provide the required level of fault tolerance. These techniques are considered in the context of distributed real-time systems wit...
Publication date: 1999